TECHNIQUES TO CORRECT LINGUISTIC TRAINING BIAS IN TRAINING DATA
Patent abstract:
"Techniques for correcting linguistic training bias in training data." In automated assistant systems, a deep learning model in the form of a long short-term memory (LSTM) classifier is used to map questions to classes, with each class having a manually curated response. A team of domain experts manually creates the training data used to train this classifier. Relying on human curation often results in linguistic training bias creeping into the training data, since each individual has a specific style of writing natural language and uses certain words only in a specific context. Deep models eventually learn this bias rather than the main concept words of the target classes. To correct this bias, meaningful sentences are automatically generated using a generative model and then used to train a classification model. For example, a variational autoencoder (VAE) is used as the generative model to generate new sentences and a language model (LM) is used to select sentences based on probability.

Publication number: BR102018068925A2
Application number: R102018068925-8
Filing date: 2018-09-18
Publication date: 2019-05-28
Inventors: Puneet Agarwal; Mayur Patidar; Lovekesh Vig; Gautam Shroff
Applicant: Tata Consultancy Services Limited
Patent description:
"TECHNIQUES TO CORRECT LINGUISTIC TRAINING BIAS IN TRAINING DATA"

DESCRIPTION

Priority Claim

[001] The present application claims priority to Indian Patent Application No. 201721033035, filed on September 18, 2017. The entire content of the aforementioned application is hereby incorporated by reference.

Technical Field

[002] The embodiments herein generally relate to training data and, more particularly, to techniques for correcting linguistic training bias in training data.

State of the Art

[003] In recent years, an automated assistance system has been implemented in multinational organizations to answer frequently asked questions (FAQs) from employees. The system is based on a long short-term memory (LSTM) classifier that is trained on a corpus of questions and answers carefully prepared by a small team of domain experts. However, linguistic training bias is introduced into manually created training data because specific phrases are used with little or no variation, which skews the deep learning classifier toward incorrect features. For example, the question "when is my sick leave credited?" can be classified into a category related to "Adoption Leave", resulting in a completely irrelevant answer. This happens mainly because the words surrounding 'sick leave' in the query occurred more frequently in the training data for 'Adoption Leave'. As a result, if such words occur in users' queries, the model may ignore other important words (such as 'sick leave') and classify the query into an incorrect class based on those words. In addition, the FAQs, as provided by the instructors, are often in fact incomplete, and transferring linguistic variations between pairs of questions and answers can reveal new classes of questions for which answers are missing. Furthermore, relying on human curation can allow such linguistic training biases to enter the training data, since each individual has a specific style of writing natural language and uses some words only in a specific context.

SUMMARY

[004] The following is a simplified summary of some embodiments of the disclosure, in order to provide a basic understanding of the embodiments. This summary is not a comprehensive overview of the embodiments. It is not intended to identify key/critical elements of the embodiments or to delineate the scope of the embodiments. Its sole purpose is to present some embodiments in a simplified form as a prelude to the more detailed description that is presented below.

[005] In view of the above, an embodiment of the present application provides methods and systems to correct linguistic training bias in training data. In one aspect, a processor-implemented method includes steps to: receive a query from a user; generate a set of queries associated with the received query using a long short-term memory variational autoencoder (LSTM-VAE) at a time of inference, in which the LSTM-VAE is trained using a weighted cost annealing technique; discard one or more queries comprising consecutive repeated words from the generated query set to create a subset of the generated queries; select one or more queries from the subset of the generated queries based on probability using a language model trained on a first set of training data, in which the one or more selected queries are consistent with predefined data;
classify the one or more selected queries as queries that exist in the first set of training data and as new queries using a first classifier model; expand the first training data set with the new queries to obtain a second training data set; and train a second classifier model using the second set of training data, thus correcting the linguistic training bias in the training data.

[006] In another aspect, a system is provided to correct linguistic training bias in training data. The system includes one or more memories; and one or more hardware processors, the one or more memories coupled to the one or more hardware processors, in which the one or more hardware processors are able to execute programmed instructions stored in the one or more memories to: receive a query from a user; generate a set of queries associated with the received query using a long short-term memory variational autoencoder (LSTM-VAE) at a time of inference, in which the LSTM-VAE is trained using a weighted cost annealing technique; discard one or more queries comprising consecutive repeated words from the generated query set to create a subset of the generated queries; select one or more queries from the subset of the generated queries based on probability using a language model trained on a first set of training data, in which the one or more selected queries are consistent with predefined data; classify the one or more selected queries as queries that exist in the first set of training data and new queries using a first classifier model; expand the first training data set with the new queries to obtain a second training data set; and train a second classifier model using the second set of training data, thus correcting the linguistic training bias in the training data.

[007] In yet another aspect, a non-transitory computer-readable medium having a computer program embodied therein is provided for executing a method to correct linguistic training bias in training data. The method includes steps to: receive a query from a user; generate a set of queries associated with the received query using a long short-term memory variational autoencoder (LSTM-VAE) at a time of inference, in which the LSTM-VAE is trained using a weighted cost annealing technique; discard one or more queries comprising consecutive repeated words from the generated query set to create a subset of the generated queries; select one or more queries from the subset of the generated queries based on probability using a language model trained on a first set of training data; classify the one or more selected queries as queries that exist in the first set of training data and new queries using a first classifier model; expand the first training data set with the new queries to obtain a second training data set; and train a second classifier model using the second set of training data, thus correcting the linguistic training bias in the training data.

[008] It should be appreciated by those skilled in the art that any block diagram in this application represents conceptual views of illustrative systems that embody the principles of the present subject matter. Likewise, it is appreciated that any flowcharts, flow diagrams, state transition diagrams, pseudocode and the like represent various processes that can be substantially represented in a computer-readable medium and thus executed by a processor or computing device, whether or not such a processor or computing device is explicitly shown.

BRIEF DESCRIPTION OF THE DRAWINGS
[009] The detailed description is described with reference to the attached Figures. In the Figures, the leftmost digit(s) of a reference number identifies the Figure in which the reference number appears for the first time. The same numbers are used in all drawings to refer to similar features and modules.

[0010] Figure 1 illustrates a block diagram of a system for correcting linguistic training bias in training data, according to an example embodiment.

[0011] Figure 2 illustrates a long short-term memory (LSTM) variational autoencoder (VAE) architecture, according to an example embodiment.

[0012] Figure 3 illustrates a graph that represents the KL divergence loss over the training steps, according to an example embodiment.

[0013] Figure 4 illustrates a table that includes new queries generated by the LSTM-VAE, according to an example embodiment.

[0014] Figure 5 illustrates a flow diagram of a method to correct linguistic training bias in training data, according to an example embodiment.

[0015] Figure 6 illustrates the steps of a query generation process flow, according to an example embodiment.

[0016] It should be appreciated by those skilled in the art that any block diagrams in this application represent conceptual views of illustrative systems and devices that embody the principles of the present subject matter. Likewise, it will be appreciated that any flowcharts, flow diagrams and the like represent various processes that can be represented substantially in a computer-readable medium and thus executed by a computer or processor, whether or not that computer or processor is explicitly shown.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0017] The embodiments in this application and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the attached drawings and detailed in the following description. The examples used in the present application are only intended to facilitate an understanding of the ways in which the embodiments of the present invention can be practiced and to allow those skilled in the art to practice the embodiments in the present application. Consequently, the examples should not be interpreted as limiting the scope of the embodiments in this application.

[0018] The subject matter in the present application provides a system and method to correct linguistic training bias in training data, according to an example embodiment. The present disclosure automatically generates meaningful sentences using a generative model and then uses them to train a classification model after proper annotation. In the present subject matter, a variational autoencoder (VAE), trained using a weighted cost annealing technique, is used as the generative model for generating new sentences, and a language model (LM) is used for the selection of sentences based on probability. The VAE is modeled using RNNs comprising LSTM units. The LSTM-VAE can be used to automatically generate linguistically new questions, which (a) correct the classifier bias when added to the training data, (b) reveal incompleteness in the set of responses and (c) improve the accuracy and the generalization ability of the base LSTM classifier, allowing it to learn from smaller training data. The new questions sometimes belonged to completely new classes that were not present in the original training data.

[0019] The methods and systems are not limited to the specific embodiments described in this application.
In addition, the method and the system can be practiced independently and separately from other modules and methods described in this application. Each device/method element/module can be used in combination with other elements/modules and other methods.

[0020] The manner in which the system and method for correcting linguistic training bias in training data work is explained in detail in relation to Figures 1 to 6. Although aspects of the described methods and systems for correcting linguistic training bias in training data can be implemented in any number of different systems, utility environments and/or configurations, the embodiments are described in the context of the following example system(s).

[0021] Figure 1 illustrates a block diagram of a system 100 to correct linguistic training bias in training data, according to an example embodiment. In an example embodiment, system 100 can be realized in, or is in direct communication with, a computing device. System 100 includes or is in communication with one or more hardware processors, such as processor(s) 102, one or more memories, such as a memory 104, and a network interface unit, such as a network interface unit 106. In one embodiment, processor 102, memory 104 and network interface unit 106 can be coupled by a system bus or a similar mechanism. Although Figure 1 shows example components of system 100, in other implementations system 100 may contain fewer components, additional components, different components or components arranged differently than shown in Figure 1.

[0022] Processor 102 may include circuitry that implements, among others, audio and logic functions associated with communication. For example, processor 102 may include, but is not limited to, one or more digital signal processors (DSPs), one or more microprocessors, one or more special-purpose computer chips, one or more field programmable gate arrays (FPGAs), one or more application-specific integrated circuits (ASICs), one or more computer(s), various analog-to-digital converters, digital-to-analog converters and/or other support circuits. Processor 102 may therefore also include functionality to encode messages and/or data or information. Processor 102 may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support the operation of processor 102. In addition, processor 102 may include functionality to run one or more software programs, which can be stored in memory 104 or otherwise accessible to processor 102.

[0023] The functions of the various elements shown in the Figure, including any functional blocks labeled "processor(s)", can be provided through the use of dedicated hardware, as well as hardware capable of running software in association with appropriate software. When provided by a processor, the functions can be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which can be shared. In addition, the explicit use of the term "processor" should not be interpreted as referring exclusively to hardware capable of running software, and may implicitly include, without limitation, DSP hardware, network processor, application-specific integrated circuit (ASIC), FPGA, read-only memory (ROM) for software storage, random access memory (RAM) and non-volatile storage. Other hardware, conventional and/or custom, can also be included.
[0024] The interface(s) 106 may include a variety of software and hardware interfaces, for example, interfaces for peripheral device(s), such as a keyboard, a mouse, external memory and a printer. The interface(s) 106 can facilitate multiple communications over a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as wireless LAN (WLAN), cellular or satellite.

[0025] One or more memories, such as a memory 104, can store any number of pieces of information and data used by the system to implement the functions of the system. Memory 104 may include, for example, volatile memory and/or non-volatile memory. Examples of volatile memory may include, but are not limited to, volatile random access memory. The non-volatile memory can, additionally or alternatively, comprise an electrically erasable programmable read-only memory (EEPROM), flash memory, hard disk or the like. Some examples of volatile memory include, but are not limited to, random access memory, dynamic random access memory, static random access memory and the like. Some examples of non-volatile memory include, but are not limited to, hard disks, magnetic tapes, optical disks, programmable read-only memory, erasable programmable read-only memory, electrically erasable programmable read-only memory, flash memory and the like. Memory 104 can be configured to store information, data, applications, instructions or the like to allow system 100 to perform various functions according to various example embodiments. Additionally or alternatively, memory 104 can be configured to store instructions that, when executed by processor 102, cause the system to behave in the manner described in the various embodiments. Memory 104 includes a training bias correction module 108 and other modules. Module 108 and the other modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. The other modules can include programs or coded instructions that supplement the applications and functions of system 100.

[0026] In operation, system 100 receives a query from a user. For example, system 100 includes an FAQ bot that receives a query from the user. In one example, the frequently-asked-question dataset used to build the bot includes sets of semantically similar questions Q_i = {q_1, ..., q_n_i} and their corresponding answer a_i. A set of such questions Q_i and the corresponding answer a_i are collectively referred to as a query set s_i = {Q_i, a_i}. The questions in the query set s_i are represented as Q_i = Q(s_i). The dataset D is assumed to comprise many of these query sets, that is, D = {s_1, ..., s_m}. In the implementation of the chatbot, given a query q from a user, the objective is to select the corresponding query set s_i with a multi-class classification model, so that the corresponding answer is shown.
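To make the data organization of paragraph [0026] concrete, the following is a minimal Python sketch of one possible in-memory representation of the dataset D, the query sets s_i = {Q_i, a_i} and the lookup of the curated answer from a predicted class; the class name, field names and placeholder strings are illustrative assumptions, not part of the disclosed system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuerySet:
    """One query set s_i = {Q_i, a_i}: paraphrased questions Q_i and their shared answer a_i."""
    questions: List[str]   # Q_i = {q_1, ..., q_n_i}
    answer: str            # a_i, manually curated by domain experts


# The dataset D = {s_1, ..., s_m}; a multi-class classifier maps a user query q
# to the index i of the query set s_i whose answer a_i should be shown.
D: List[QuerySet] = [
    QuerySet(questions=["when is my sick leave credited",
                        "when will sick leave get credited"],
             answer="<curated answer about sick leave>"),
    QuerySet(questions=["how do I apply for adoption leave"],
             answer="<curated answer about adoption leave>"),
]


def answer_for(predicted_class: int) -> str:
    """Return the curated answer for the class index predicted by the classifier."""
    return D[predicted_class].answer
```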
[0027] In addition, the training bias correction module 108 generates a set of queries associated with the received query using a long short-term memory variational autoencoder (LSTM-VAE) at a time of inference. Given all the questions in the training data D, Q = ∪ Q(s_i), ∀ s_i ∈ D, the training bias correction module 108 generates new questions Q' using the LSTM-VAE. Some of the questions in Q' are semantically similar to one of D's query sets, while the remaining questions do not belong to any of the existing query sets.

[0028] For example, the VAE is a generative model that, unlike sequence autoencoders, is composed of a probabilistic encoder (q_φ(z|x), the recognition model) and a decoder (p_θ(x|z), the generative model). The posterior distribution p_θ(z|x) is known to be computationally intractable. In this example, a single-layer recurrent neural network (RNN) with LSTM units is used as both the VAE encoder and decoder. Initially, a variable-length input query is passed to the encoder in reverse order, as shown in architecture 200 in Figure 2. The words in a query are first converted into a vector representation by a word embedding layer before being fed to the LSTM layer. The final hidden state of the LSTM, h_e, then passes through a feed-forward layer, which predicts μ and σ of the approximate posterior distribution q_φ(z|x). After sampling ε via the reparameterization trick, the sampled encoding z passes through a feed-forward layer to obtain h_d0, which is the initial state of the decoder RNN. In addition, the encoding z is passed as input to the decoder LSTM layer at every time step. The vector representation of the word with the highest probability, predicted at time t, is also passed as input at the next time step (t + 1), as shown in Figure 2. This helps the LSTM-VAE to generate more queries compared to those generated by feeding the actual word to the decoder.

[0029] In an example implementation, the LSTM-VAE is trained using a weighted cost annealing technique. For example, the weighted cost annealing technique linearly increases the weight of the Kullback-Leibler (KL) divergence loss after predefined periods and simultaneously reduces the weight of the reconstruction loss. In this example, the weight of the KL divergence loss increases linearly after every e epochs while the weight of the reconstruction loss is reduced accordingly. Because of this, even if the KL divergence loss initially increases for some training steps, it starts to decrease over later steps but remains different from zero. This is shown in graph 300 in Figure 3.

[0030] In this example implementation, the weighted loss function shown below in equation (1) is used, and training starts with λ = 0, which is kept fixed for the first e epochs, that is, λ(0→e) = 0. Thereafter, λ is increased by r after every e epochs, that is, λ(e→2e) = λ(0→e) + r. Here, e and r are treated as hyperparameters. For example, the tuning values for e are [5, 10, 15] and for r are [0.1, 0.05, 0.025].

L(φ, θ, x) = λ · KL(q_φ(z|x) ‖ p_θ(z)) − (1 − λ) · E_{q_φ(z|x)}[log p_θ(x|z)]    (1)

[0031] In this example implementation, z is passed at each step of the LSTM decoder together with the highest-probability word taken from the predicted distribution, i.e., greedy decoding, w_t = argmax_{w_t} p(w_t | w_{0:t−1}, h_d, z). To make the decoder depend more on z during the decoding of the sentence, word dropout is used to pass an <UNK> token to the next step instead of the word produced by greedy decoding. During the decoding of a sentence using z, the fraction k of words replaced at random by <UNK> tokens, where k ∈ [0, 1], is also treated as a hyperparameter.
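As one concrete reading of paragraphs [0028] to [0031], the following PyTorch sketch shows a single-layer LSTM-VAE with the reparameterization step z = μ + ε · σ (equation (2) in the next paragraph), word dropout with <UNK> tokens, and the annealed weighted loss of equation (1). It is an illustrative assumption rather than the patented implementation; the module names, hyperparameter defaults and the cap on λ are not specified in the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LSTMVAE(nn.Module):
    """Single-layer LSTM encoder/decoder VAE for short queries (illustrative sketch)."""

    def __init__(self, vocab_size, emb_dim=100, hid_dim=256, z_dim=20,
                 unk_idx=1, word_dropout=0.3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, z_dim)
        self.to_logvar = nn.Linear(hid_dim, z_dim)
        self.z_to_h = nn.Linear(z_dim, hid_dim)             # produces h_d0, the initial decoder state
        self.decoder = nn.LSTM(emb_dim + z_dim, hid_dim, batch_first=True)  # z is fed at every step
        self.out = nn.Linear(hid_dim, vocab_size)
        self.unk_idx, self.word_dropout = unk_idx, word_dropout

    def forward(self, src, dec_in):
        # src: input query token ids (reversed order); dec_in: decoder input ids (shifted targets)
        _, (h_e, _) = self.encoder(self.embed(src))         # final encoder hidden state h_e
        mu, logvar = self.to_mu(h_e[-1]), self.to_logvar(h_e[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # z = mu + eps * sigma (eq. 2)
        # word dropout: replace a fraction k of decoder input words with <UNK>
        drop = torch.rand(dec_in.shape, device=dec_in.device) < self.word_dropout
        emb = self.embed(dec_in.masked_fill(drop, self.unk_idx))
        z_rep = z.unsqueeze(1).expand(-1, emb.size(1), -1)        # pass z at every time step
        h0 = torch.tanh(self.z_to_h(z)).unsqueeze(0)              # h_d0
        dec_out, _ = self.decoder(torch.cat([emb, z_rep], dim=-1), (h0, torch.zeros_like(h0)))
        return self.out(dec_out), mu, logvar


def vae_loss(logits, targets, mu, logvar, lam, pad_idx=0):
    """Weighted loss of eq. (1): lam * KL(q(z|x) || N(0, I)) - (1 - lam) * E[log p(x|z)]."""
    recon = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1),
                            ignore_index=pad_idx)                 # negative log-likelihood term
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1))
    return lam * kl + (1.0 - lam) * recon


def kl_weight(epoch, e=10, r=0.05, cap=0.5):
    """Weighted cost annealing: lambda = 0 for the first e epochs, then raised by r every e epochs.
    The cap is an assumption; the description only states that lambda grows by r per e epochs."""
    return min(cap, (epoch // e) * r)
```

The greedy decoding loop used at inference time, which feeds back the most probable word (or <UNK> under word dropout) together with z at each step, is omitted here for brevity.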
[0032] To generate phrases similar to the input phrases, ε is sampled and z is obtained using equation (2), which is a continuous and therefore differentiable function. For example, the dimensions tried for z are [20, 30, 50]. These sampled encodings are decoded by the generative model using greedy decoding to obtain the sentences.

z = μ + ε · σ, where ε ~ N(0, 1)    (2)

[0033] In addition, the training bias correction module 108 discards one or more queries comprising consecutive repeated words from the set of generated queries to create a subset of the generated queries. In addition, the training bias correction module 108 selects one or more queries from the subset of the generated queries based on probability using a language model trained on a first set of training data. The one or more selected queries are consistent with predefined data. For example, the predefined data includes queries generated by experts in the specific domain. In an example implementation, the training bias correction module 108 learns, through the language model, a conditional probability distribution over vocabulary words for the subset of generated queries. In addition, the training bias correction module 108 selects queries from the generated subset of queries based on the conditional probability distribution learned over the vocabulary words.

[0034] In one example, an RNN language model (RNNLM) is a generative model that learns the conditional probability distribution over vocabulary words. It predicts the next word (w_{i+1}), given the representation of the words seen so far h_i and the current input w_i, by maximizing the log probability of the next word, p(w_{i+1} | h_i, w_i) = Softmax(W_s h_i + b_s), averaged over the sequence length N. Generally, the performance of the RNNLM is measured using perplexity (lower is better), Perplexity = exp(L_CE-LM). The cross-entropy loss is used to train the language model:

L_CE-LM = −(1/N) Σ_{i=1}^{N} log(p(w_{i+1} | h_i, w_i))

[0035] In addition, the training bias correction module 108 classifies the selected queries as queries that exist in the first set of training data using a first classifier model (that is, a deep learning classifier) or as new queries based on manual labeling. For example, the first classifier model (M1) is a single-layer recurrent neural network with LSTM units for classification, trained on the first training data set. This is used as a baseline for classification. Classification can generally be thought of as a two-step process, with the first step requiring a representation of the data. The second step involves using this representation for classification. Data can be represented using a bag-of-words approach, which ignores word-order information, or using hand-crafted features, which fail to generalize to multiple datasets/tasks. A task-specific sentence representation is learned using RNNs with LSTM units by encoding the variable-length sentence into a fixed-length vector representation h, obtained after passing the sentence through the RNN layer. Softmax is then applied over an affine transformation of h, that is, p(c|h) = Softmax(W_s h + b_s). To learn the weights of the above model, the categorical cross-entropy loss is minimized, that is, L_CE = −Σ_i y_i · log(p(c_i|h)), where c_i is one of the m classes and y_i is 1 only for the target class and zero otherwise.
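The baseline classifier M1 of paragraph [0035] can be sketched as follows in PyTorch: a single-layer LSTM whose final hidden state h is passed through an affine layer and a softmax, trained with categorical cross-entropy. The helper at the end computes the entropy of the softmax distribution, which the next paragraph uses to rank candidate queries. Class names and dimensions are illustrative assumptions, not the disclosed configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LSTMClassifier(nn.Module):
    """Single-layer LSTM sentence classifier: p(c | h) = softmax(W_s h + b_s)."""

    def __init__(self, vocab_size, num_classes, emb_dim=100, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.fc = nn.Linear(hid_dim, num_classes)           # affine transformation W_s, b_s

    def forward(self, tokens):
        _, (h, _) = self.lstm(self.embed(tokens))           # h: fixed-length sentence representation
        return self.fc(h[-1])                               # unnormalized class scores (logits)


def train_step(model, optimizer, tokens, labels):
    """One step minimizing the categorical cross-entropy L_CE = -sum_i y_i log p(c_i | h)."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(tokens), labels)
    loss.backward()
    optimizer.step()
    return loss.item()


def softmax_entropy(logits):
    """Entropy of the softmax distribution; low entropy means a confident prediction."""
    log_p = F.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1)
```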
[0036] In an example implementation, the training bias correction module 108 selects one or more of the new queries (the top K) that are correctly classified by the first classifier model, based on the entropy of the softmax distribution. In this example implementation, to obtain a label for the new questions generated by the VAE, the training bias correction module 108 uses M1 and chooses the top K phrases, based on the entropy of the softmax distribution, as candidates to augment the training data. In addition, the training bias correction module 108 allows the user to identify selected queries that are incorrectly classified by the first classifier model. In one embodiment, the training bias correction module 108 allows the user to check the label and correct it if the query is incorrectly classified by M1. In addition, the training bias correction module 108 removes questions that clearly correspond to new classes.

[0037] In addition, the training bias correction module 108 expands the first set of training data with the new top-K queries correctly classified by the first classifier model (M1) and with the queries that are erroneously classified by the first classifier model, to obtain a second set of training data. In addition, the training bias correction module 108 trains a second classifier model using the second set of training data, thus correcting the linguistic training bias in the training data. The queries generated by the LSTM-VAE include new classes of questions for the FAQ chatbot, not present in the first training data, which are reviewed and accepted by the domain experts for deployment.

[0038] Figure 5 illustrates a flow diagram of a method to correct linguistic training bias in training data, according to an example embodiment. The processor-implemented method 500 can be described in the general context of computer-executable instructions. Generally, computer-executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform specific functions or implement particular abstract data types. Method 500 can also be practiced in a distributed computing environment, where the functions are performed by remote processing devices that are connected by a communication network. The order in which method 500 is described is not intended to be construed as a limitation, and any number of blocks of the described method can be combined in any order to implement method 500, or an alternative method. In addition, method 500 can be implemented in any hardware, software, firmware or combination thereof. In one embodiment, method 500 represented in the flowchart can be performed by a system, for example, system 100 of Figure 1.

[0039] In block 502, a query is received from a user. In block 504, a set of queries associated with the received query is generated using a long short-term memory variational autoencoder (LSTM-VAE) at a time of inference, in which the LSTM-VAE is trained using a weighted cost annealing technique. For example, the weighted cost annealing technique linearly increases a Kullback-Leibler (KL) divergence loss weight after predefined periods and simultaneously reduces a reconstruction loss weight. In block 506, one or more queries comprising consecutively repeated words are discarded from the generated query set to create a subset of the generated queries.
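As an illustration of the filtering in block 506 (and block 604 of Figure 6, described below), the following minimal Python sketch drops generated queries that contain consecutively repeated words or that already appear in the training data; the function names are assumptions made for this example, and training_queries is assumed to hold normalized (lower-cased, whitespace-collapsed) question strings.

```python
from typing import Iterable, List, Set


def has_consecutive_repeat(query: str) -> bool:
    """True if any word is immediately repeated, e.g. 'when when is sick leave credited'."""
    words = query.lower().split()
    return any(a == b for a, b in zip(words, words[1:]))


def filter_generated_queries(generated: Iterable[str], training_queries: Set[str]) -> List[str]:
    """Keep only novel, well-formed candidates (block 506 / block 604)."""
    seen, kept = set(), []
    for query in generated:
        key = " ".join(query.lower().split())
        if key in training_queries or key in seen or has_consecutive_repeat(query):
            continue
        seen.add(key)
        kept.append(query)
    return kept
```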
In block 508, one or more queries are selected from the subset of the generated queries based on probability using a language model trained on a first set of training data, where the one or more selected queries are consistent with predefined data. For example, the predefined data includes queries generated by experts in the specific domain. In an example embodiment, a conditional probability distribution over vocabulary words is learned for the subset of generated queries through the language model. In addition, queries are selected from the subset of generated queries based on the conditional probability distribution learned over the vocabulary words.

[0040] In block 510, the selected queries are classified as queries that exist in the first set of training data or as new queries using a first classifier model. In one example, the first classifier model is a single-layer recurrent neural network with LSTM units for classification, trained on the first training data set. In block 512, the first training data set is expanded with the new queries to obtain a second training data set. In an example embodiment, one or more of the new queries that are correctly classified by the first classifier model are selected based on the entropy of the softmax distribution. In addition, the first training data set is expanded with the one or more new queries that are correctly classified by the first classifier model. In some embodiments, the user is able to identify selected queries that are classified incorrectly by the first classifier model. In addition, the second set of training data is expanded with the queries that are erroneously classified by the first classifier model. In block 516, a second classifier model is trained using the second set of training data, thus correcting the linguistic training bias in the training data.

[0041] Figure 6 illustrates the steps 600 of a query generation process flow, according to an example embodiment. Figure 6 illustrates the entire workflow followed to generate the new queries. As shown in Figure 6, of the 175,000 queries generated using the LSTM-VAE in block 602, the queries already present in the training data, as well as those that have the same word repeated more than once consecutively, are removed in block 604. After this process, about 5,700 queries remain. These queries are then evaluated using an LM, and only the top 1500 phrases are selected based on probability in block 606. Many of these phrases are grammatically correct, while only a few are semantically inconsistent. In block 608, 1066 queries are selected based on manual labeling. In this process, 434 sentences did not belong to any of the existing classes. These phrases are given to experts for review, and they select 120 phrases spanning 33 new classes. In block 610, a classifier (M1) classifies the 1066 queries as queries that already exist in the original query sets or as new queries. In block 612, the erroneously classified queries and the top-k correctly classified queries are identified and added to the original training data to obtain new training data. In addition, a new classifier is trained using the new training data, thus correcting the linguistic training bias in the training data.
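To illustrate the probability-based selection of blocks 508 and 606, the following PyTorch sketch scores each surviving candidate with an LSTM language model by its average log-probability (the negative of the per-word cross-entropy L_CE-LM, so a lower perplexity ranks higher) and keeps the top N. The default of 1500 kept sentences mirrors the workflow above, but the model, the encode callable and the rest of the code are illustrative assumptions.

```python
from typing import Callable, List, Tuple

import torch
import torch.nn as nn
import torch.nn.functional as F


class LSTMLanguageModel(nn.Module):
    """Word-level LSTM language model predicting p(w_{i+1} | words seen so far)."""

    def __init__(self, vocab_size, emb_dim=100, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.out(h)                                    # next-word logits at every position


@torch.no_grad()
def sentence_score(lm: LSTMLanguageModel, token_ids: List[int]) -> float:
    """Average log-probability of a sentence (assumes at least two tokens, e.g. with BOS/EOS added)."""
    ids = torch.tensor(token_ids).unsqueeze(0)                # shape (1, N)
    log_probs = F.log_softmax(lm(ids[:, :-1]), dim=-1)        # predictions for w_2 .. w_N
    picked = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return picked.mean().item()                               # perplexity = exp(-score)


def select_top_candidates(lm: LSTMLanguageModel, candidates: List[str],
                          encode: Callable[[str], List[int]],
                          top_n: int = 1500) -> List[Tuple[str, float]]:
    """Rank candidate queries by LM score and keep the best top_n (block 606 keeps 1500)."""
    scored = [(q, sentence_score(lm, encode(q))) for q in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```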
[0042] The various embodiments described in Figures 1 to 6 propose a generative-model approach, which uses an LSTM-VAE followed by sentence selection using an LM to correct linguistic training bias in training data. In this approach, the weighted cost annealing technique is used to train the LSTM-VAE. When these phrases are added to the training set, the model is indirectly forced to learn to distinguish classes based on a few other words in addition to those non-concept words. Therefore, expanding the training data with automatically generated phrases is able to correct the overfitting caused by linguistic training bias. The newly generated phrases sometimes belonged to completely new classes that were not present in the original training data. In addition, expanding the training data with automatically generated phrases results in better accuracy (2%) of the deep learning classifier.

[0043] The written description describes the subject matter of this application to allow anyone skilled in the art to make and use the embodiments. The scope of the subject-matter embodiments is defined by the claims and may include other changes that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.

[0044] However, it should be understood that the scope of protection extends to such a program and, in addition, to a computer-readable medium having a message therein; such non-transitory computer-readable storage media contain program code means for implementing one or more steps of the method when the program is executed on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device that can be programmed including, for example, any kind of computer, such as a server or a personal computer, or the like, or any combination thereof. The device may also include means that could be, for example, hardware means such as an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), or a combination of hardware and software means, for example, an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Therefore, the means can include both hardware means and software means. The method embodiments described in the present application can be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments can be implemented on different hardware devices, for example, using a plurality of CPUs.

[0045] The embodiments described in this application may include hardware and software elements. The embodiments that are implemented in software include, but are not limited to, firmware, resident software, microcode, etc. The functions performed by the various modules described in this application can be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate or transport the program for use by or in connection with the instruction execution system, apparatus or device.
[0046] The foregoing description of the specific implementations and embodiments will so fully reveal the general nature of the implementations and embodiments of this application that others may, by applying current knowledge, readily modify and/or adapt such specific embodiments for various applications without departing from the generic concept, and, therefore, such adaptations and modifications should be and are intended to be understood within the meaning and range of equivalents of the disclosed embodiments. It should be understood that the phraseology or terminology used in this application is for the purpose of description and not limitation. Therefore, although the embodiments described in this application have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments of this application can be practiced with modification within the spirit and scope of the embodiments described in this application.

[0047] The foregoing description has been presented with reference to various embodiments. Persons of ordinary skill in the art and technology to which this application pertains will appreciate that alterations and changes in the described structures and methods of operation can be practiced without significantly departing from the principle, spirit and scope.
Claims:
Claims (15)

1. Processor-implemented method (500) characterized by the fact that it comprises: receiving, by one or more processors, a query from a user (502); generating, by one or more processors, a set of queries associated with the received query using a long short-term memory variational autoencoder (LSTM-VAE) at a time of inference, in which the LSTM-VAE is trained using a weighted cost annealing technique (504); discarding, by one or more processors, one or more queries comprising consecutive repeated words from the set of generated queries to create a subset of the generated queries (506); selecting, by one or more processors, one or more queries from the subset of the generated queries based on probability through a language model trained on a first set of training data, where the one or more selected queries are consistent with predefined data (508); classifying, by one or more processors, the one or more selected queries as queries that exist in the first set of training data and new queries using a first classifier model (510); expanding, by one or more processors, the first set of training data with the new queries to obtain a second set of training data (512); and training, by one or more processors, a second classifier model using the second set of training data, thus correcting the linguistic training bias in the training data (514).

2. Method according to claim 1, characterized by the fact that the weighted cost annealing technique linearly increases a Kullback-Leibler (KL) divergence loss weight after predefined periods.

3. Method according to claim 1, characterized by the fact that the selection of one or more queries from the subset of the generated queries based on probability through the language model trained on the first set of training data comprises: learning a conditional probability distribution over vocabulary words for the subset of generated queries through the language model; and selecting one or more queries from the subset of the generated queries based on the conditional probability distribution learned over the vocabulary words.

4. Method according to claim 1, characterized by the fact that the first classifier model is a single-layer recurrent neural network with LSTM units trained on the first set of training data for classification.

5. Method according to claim 1, characterized by the fact that expanding the first set of training data with the new queries comprises: selecting one or more of the new queries that are correctly classified by the first classifier model based on an entropy of a softmax distribution function; and expanding the first set of training data with the one or more new queries that are correctly classified by the first classifier model.

6. Method according to claim 1, characterized by the fact that it additionally comprises: allowing the user, through one or more processors, to identify one or more queries that are erroneously classified by the first classifier model.

7. Method according to claim 6, characterized by the fact that expanding the first set of training data comprises: expanding the first training data set with the new queries and the queries that are erroneously classified by the first classifier model to obtain the second training data set.
8. System (100) characterized by the fact that it comprises: one or more memories (104); and one or more hardware processors (102), the one or more memories coupled to the one or more hardware processors, wherein the one or more hardware processors are capable of executing programmed instructions stored in the one or more memories for: receiving a query from a user; generating a set of queries associated with the received query using a long short-term memory variational autoencoder (LSTM-VAE) at a time of inference, in which the LSTM-VAE is trained using a weighted cost annealing technique; discarding one or more queries comprising consecutive repeated words from the generated query set to create a subset of the generated queries; selecting one or more queries from the subset of the generated queries based on probability using a language model trained on a first set of training data, where the one or more selected queries are consistent with predefined data; classifying the one or more selected queries as queries that exist in the first set of training data and new queries using a first classifier model; expanding the first training data set with the new queries to obtain a second training data set; and training a second classifier model using the second set of training data, thus correcting the linguistic training bias in the training data.

9. System according to claim 8, characterized by the fact that the weighted cost annealing technique linearly increases a Kullback-Leibler (KL) divergence loss weight after predefined periods.

10. System according to claim 8, characterized by the fact that the one or more hardware processors are capable of executing programmed instructions for: learning a conditional probability distribution over vocabulary words for the subset of generated queries, based on probability, through the language model; and selecting one or more queries from the subset of the generated queries based on the conditional probability distribution learned over the vocabulary words.

11. System according to claim 8, characterized by the fact that the first classifier model is a single-layer recurrent neural network with LSTM units for classification trained on the first training data set.

12. System according to claim 8, characterized by the fact that the one or more hardware processors are capable of executing programmed instructions for: selecting one or more of the new queries that are correctly classified by the first classifier model based on an entropy of a softmax distribution function; and expanding the first training data set with the one or more new queries that are correctly classified by the first classifier model.

13. System according to claim 8, characterized by the fact that the one or more hardware processors are further capable of executing programmed instructions for: allowing the user to identify one or more queries that are erroneously classified by the first classifier model.

14. System according to claim 13, characterized by the fact that the one or more hardware processors are capable of executing programmed instructions for: expanding the first training data set with the new queries and the queries that are erroneously classified by the first classifier model to obtain the second training data set.
15. Computer program product characterized by the fact that it comprises a non-transitory computer-readable medium having a computer-readable program embodied therein, wherein the computer-readable program, when executed on a computing device, causes the computing device to: receive a query from a user; generate a set of queries associated with the received query using a long short-term memory variational autoencoder (LSTM-VAE) at a time of inference, in which the LSTM-VAE is trained using a weighted cost annealing technique; discard one or more queries comprising consecutive repeated words from the generated query set to create a subset of the generated queries; select one or more queries from the subset of the generated queries based on probability through a language model trained on a first set of training data, in which the one or more selected queries are consistent with predefined data; classify the one or more selected queries as queries that exist in the first set of training data and new queries using a first classifier model; expand the first training data set with the new queries to obtain a second training data set; and train a second classifier model using the second set of training data, thereby correcting the linguistic training bias in the training data.
Patent family:

Publication number | Publication date
AU2018232914A1 | 2019-04-04
CA3017655C | 2021-04-20
CA3017655A1 | 2019-03-18
JP6606243B2 | 2019-11-13
JP2019057280A | 2019-04-11
MX2018011305A | 2019-07-04
US20190087728A1 | 2019-03-21
AU2018232914B2 | 2020-07-02
Legal status:

2019-05-28 | B03A | Publication of an application: publication of a patent application or of a certificate of addition of invention